Skip to content

feat(mcp): add list_cloud_workspaces and describe_cloud_organization tools#883

Merged
Aaron ("AJ") Steers (aaronsteers) merged 4 commits intomainfrom
devin/1764630057-org-workspace-mcp-tools
Dec 2, 2025
Merged

feat(mcp): add list_cloud_workspaces and describe_cloud_organization tools#883
Aaron ("AJ") Steers (aaronsteers) merged 4 commits intomainfrom
devin/1764630057-org-workspace-mcp-tools

Conversation

@aaronsteers
Copy link
Copy Markdown
Member

@aaronsteers Aaron ("AJ") Steers (aaronsteers) commented Dec 1, 2025

feat(mcp): add list_cloud_workspaces and describe_cloud_organization tools

Summary

Adds two new MCP tools for organization-scoped workspace discovery, addressing the safety concern that LLM agents could accidentally access workspaces across organizations when using instance admin credentials.

New MCP Tools:

  • list_cloud_workspaces: Lists workspaces within a specific organization, requiring either organization_id or organization_name (exact match). Supports name_contains filtering and max_items_limit.
  • describe_cloud_organization: Gets details about a specific organization, useful for looking up org ID from name or vice versa.

New api_util functions:

  • list_organizations_for_user: Uses public API GET /organizations to list orgs accessible to current user
  • list_workspaces_in_organization: Uses internal Config API POST /v1/workspaces/list_by_organization_id with server-side name filtering and pagination

Design decisions:

  • No list_organizations tool is exposed to prevent agents from enumerating orgs and picking "close matches"
  • Organization name matching is case-sensitive and exact-match only
  • Both tools require explicit organization context (ID or exact name)

Review & Testing Checklist for Human

  • Verify Config API endpoint: The workspace listing uses internal Config API (/v1/workspaces/list_by_organization_id). Confirm this endpoint exists and the request/response format matches the implementation.
  • Test with real credentials: These tools have not been tested against a real Airbyte Cloud instance. Test both tools with valid credentials to verify they work end-to-end.
  • Verify pagination: Test with an organization that has many workspaces (>100) to confirm pagination works correctly.
  • Check workspace response fields: The code accesses workspaceId, name, organizationId from the API response via .get(). Verify these field names match the actual API response.

Recommended test plan:

  1. Set up MCP server with valid Airbyte Cloud credentials
  2. Call describe_cloud_organization with a known org name → verify it returns correct org ID
  3. Call list_cloud_workspaces with that org ID → verify it returns expected workspaces
  4. Call list_cloud_workspaces with name_contains filter → verify filtering works
  5. Test error cases: invalid org name, missing parameters, etc.

Notes

Summary by CodeRabbit

  • New Features
    • Add API utilities to list a user’s organizations and to enumerate workspaces within an organization, with optional name filtering, pagination and item limits.
    • Add cloud organization lookup by ID or exact name with clear not-found and ambiguity handling.
    • Expose cloud workspace listing for organizations to support discovery and tooling integration.

✏️ Tip: You can customize this high-level summary in your review settings.

…tools

Add two new MCP tools for organization-scoped workspace discovery:

- list_cloud_workspaces: Lists workspaces within a specific organization,
  requiring either organization_id or organization_name (exact match).
  Supports name_contains filtering and max_items_limit.

- describe_cloud_organization: Gets details about a specific organization,
  useful for looking up org ID from name or vice versa.

These tools enable safe workspace discovery by requiring explicit
organization context, preventing accidental access to workspaces
across organizations.

Also adds supporting api_util functions:
- list_organizations_for_user: Lists orgs accessible to current user
- list_workspaces_in_organization: Lists workspaces in a specific org

Co-Authored-By: AJ Steers <aj@airbyte.io>
@devin-ai-integration
Copy link
Copy Markdown
Contributor

Original prompt from AJ Steers
SYSTEM:
=== BEGIN THREAD HISTORY (in #hydra-pyairbyte-mcp) ===
Aldo Alberto Jesus Gonzalez Campos (U070BMPDUHJ): :octavia-hello: Is the <https://airbytehq.github.io/PyAirbyte/airbyte/mcp.html#getting-started-with-pyairbyte-mcp|pyairbyte-mcp> the one we address as coral-mcp?

AJ Steers (U05AKF1BCC9): Yes, that's correct.

AJ Steers (U05AKF1BCC9): Do you want install/setup instructions? I can link you to those docs if helpful.

AJ Steers (U05AKF1BCC9): Essentially `uvx --from=airbyte airbyte-mcp`

AJ Steers (U05AKF1BCC9): Btw, are we ever going to use "Coral" as a user-facing dilineation? (Rhetorical, no answer needed. :sweat_smile: )

Aldo Alberto Jesus Gonzalez Campos (U070BMPDUHJ): &gt;  Essentially `uvx --from=airbyte airbyte-mcp`
Link would be cool anyway

Aldo Alberto Jesus Gonzalez Campos (U070BMPDUHJ): Also, what is the repo this mcp?

Aldo Alberto Jesus Gonzalez Campos (U070BMPDUHJ): Oh wait, is it part of <https://github.com/airbytehq/PyAirbyte/tree/main/airbyte/mcp|pyaribyte> repo?

AJ Steers (U05AKF1BCC9): Yes, that's correct.

AJ Steers (U05AKF1BCC9): <https://airbytehq.github.io/PyAirbyte/airbyte/mcp.html|https://airbytehq.github.io/PyAirbyte/airbyte/mcp.html>

AJ Steers (U05AKF1BCC9): <@U070BMPDUHJ> - Did you get what you needed here?

Aldo Alberto Jesus Gonzalez Campos (U070BMPDUHJ): Got distracted with something else, let me check if I can achieve the bare minimum of having pyairbyte mcp running with claude for my test org

AJ Steers (U05AKF1BCC9): It should work for most/all use cases. Please let me know if you run into issues, as that feedback will help make it better for the next person. :thanks-keanu:

Aldo Alberto Jesus Gonzalez Campos (U070BMPDUHJ): I don't understand this var `AIRBYTE_PROJECT_DIR`:
~/.mcp/airbyte_mcp.env
```# Airbyte Project Artifacts Directory
AIRBYTE_PROJECT_DIR=/path/to/any/writeable/project-dir```
Should that point to a local airbyte repo, PyAirbyte repo, or something else?

Aldo Alberto Jesus Gonzalez Campos (U070BMPDUHJ):... (7001 chars truncated...)

@devin-ai-integration
Copy link
Copy Markdown
Contributor

🤖 Devin AI Engineer

I'll be helping with this pull request! Here's what you should know:

✅ I will automatically:

  • Address comments on this PR. Add '(aside)' to your comment to have me ignore it.
  • Look at CI failures and help fix them

Note: I can only respond to comments from users who have write access to this repository.

⚙️ Control Options:

  • Disable automatic comment and CI monitoring

@github-actions
Copy link
Copy Markdown

github-actions bot commented Dec 1, 2025

👋 Greetings, Airbyte Team Member!

Here are some helpful tips and reminders for your convenience.

Testing This PyAirbyte Version

You can test this version of PyAirbyte using the following:

# Run PyAirbyte CLI from this branch:
uvx --from 'git+https://github.com/airbytehq/PyAirbyte.git@devin/1764630057-org-workspace-mcp-tools' pyairbyte --help

# Install PyAirbyte from this branch for development:
pip install 'git+https://github.com/airbytehq/PyAirbyte.git@devin/1764630057-org-workspace-mcp-tools'

Helpful Resources

PR Slash Commands

Airbyte Maintainers can execute the following slash commands on your PR:

  • /fix-pr - Fixes most formatting and linting issues
  • /poetry-lock - Updates poetry.lock file
  • /test-pr - Runs tests with the updated PyAirbyte

Community Support

Questions? Join the #pyairbyte channel in our Slack workspace.

📝 Edit this welcome message.

@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai bot commented Dec 1, 2025

📝 Walkthrough

Walkthrough

Adds two API utilities to list organizations and page workspaces, and MCP cloud tools/models to resolve organizations and list/describe cloud workspaces by ID or exact name.

Changes

Cohort / File(s) Summary
Core API utilities
airbyte/_util/api_util.py
Added list_organizations_for_user() to call the public organizations endpoint with error handling, and list_workspaces_in_organization() to page and aggregate workspaces for an organization with optional name filtering and max result limits.
Cloud operations MCP tools & models
airbyte/mcp/cloud_ops.py
Added CloudOrganizationResult and CloudWorkspaceResult models; introduced _resolve_organization() / _resolve_organization_id() helpers for exact-name or ID resolution with ambiguity/missing handling; added describe_cloud_organization() and list_cloud_workspaces() MCP tools; updated imports to use centralized exception types and SecretString for credentials.

Sequence Diagram(s)

sequenceDiagram
    actor User
    participant MCP as MCP Client
    participant Tool as describe_cloud_organization()
    participant Resolver as _resolve_organization_id()
    participant APIUtil as api_util.list_organizations_for_user()
    participant API as Airbyte API

    User->>MCP: request org details (id or name)
    MCP->>Tool: describe_cloud_organization(params)
    Tool->>Resolver: resolve by id or exact name
    Resolver->>APIUtil: list_organizations_for_user(api_root, client_id, client_secret)
    APIUtil->>API: GET /api/v1/organizations
    API-->>APIUtil: organizations list
    APIUtil-->>Resolver: organizations
    Resolver-->>Tool: resolved organization_id
    Tool->>Tool: build CloudOrganizationResult
    Tool-->>MCP: CloudOrganizationResult
    MCP-->>User: {id, name, email}
Loading
sequenceDiagram
    actor User
    participant MCP as MCP Client
    participant Tool as list_cloud_workspaces()
    participant Resolver as _resolve_organization_id()
    participant APIUtil as api_util.list_workspaces_in_organization()
    participant API as Airbyte Config API

    User->>MCP: request workspaces for org (id or name)
    MCP->>Tool: list_cloud_workspaces(params)
    Tool->>Resolver: resolve to organization_id
    Resolver->>APIUtil: list_workspaces_in_organization(organization_id,...)
    loop pagination
        APIUtil->>API: POST /api/v1/config/workspaces (offset/limit/nameContains)
        API-->>APIUtil: workspace page
        APIUtil-->>Tool: workspace items
    end
    Tool->>Tool: map to CloudWorkspaceResult list
    Tool-->>MCP: [CloudWorkspaceResult]
    MCP-->>User: [{id, name, organization_id}, ...]
Loading

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

  • Review pagination logic and stop conditions in list_workspaces_in_organization() (offset/limit, max_items_limit enforcement).
  • Verify _resolve_organization_id() behavior for exact-match vs ambiguous matches and error types raised.
  • Check consistency of exception types/propagation between api_util (raises AirbyteError) and MCP tools (use PyAirbyteInputError / AirbyteMissingResourceError).

Suggested reviewers

  • aldogonzalez8

Could you take a look at the pagination and resolution edge cases above, wdyt?

Pre-merge checks and finishing touches

✅ Passed checks (3 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title clearly and concisely describes the main change: adding two new MCP tools (list_cloud_workspaces and describe_cloud_organization) for cloud operations.
Docstring Coverage ✅ Passed Docstring coverage is 90.00% which is sufficient. The required threshold is 80.00%.
✨ Finishing touches
  • 📝 Generate docstrings
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch devin/1764630057-org-workspace-mcp-tools

📜 Recent review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 95ed658 and 0e133e4.

📒 Files selected for processing (1)
  • airbyte/_util/api_util.py (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • airbyte/_util/api_util.py
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
  • GitHub Check: Pytest (All, Python 3.10, Ubuntu)
  • GitHub Check: Pytest (All, Python 3.11, Windows)
  • GitHub Check: Pytest (All, Python 3.10, Windows)
  • GitHub Check: Pytest (No Creds)
  • GitHub Check: Pytest (Fast)

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

🧹 Nitpick comments (4)
airbyte/mcp/cloud_ops.py (3)

925-933: Consider restructuring error messages to decouple dynamic values.

Based on learnings, error messages in PyAirbyte follow a structlog-inspired design where dynamic values (especially potential PII like organization names) are provided via the context parameter rather than embedded in the message string. This helps avoid including PII in telemetry.

Would you consider restructuring these errors? For example:

     if not matching_orgs:
         raise AirbyteMissingResourceError(
             resource_type="organization",
             context={
                 "organization_name": organization_name,
-                "message": f"No organization found with exact name '{organization_name}' "
-                "for the current user.",
             },
+            message="No organization found with exact name for the current user.",
         )
 
     if len(matching_orgs) > 1:
         raise PyAirbyteInputError(
-            message=f"Multiple organizations found with name '{organization_name}'. "
-            "Please use 'organization_id' instead to specify the exact organization."
+            message="Multiple organizations found with the specified name. "
+            "Please use 'organization_id' instead to specify the exact organization.",
+            context={"organization_name": organization_name},
         )

Based on learnings, this pattern helps prevent PII leakage in telemetry. Wdyt?

Also applies to: 935-939


1005-1012: Consider whether empty string defaults are appropriate here.

Using .get() with empty string defaults ("") means that if the API response is missing workspaceId, name, or organizationId fields, we'll create result objects with empty strings rather than failing fast or using None. This could mask data quality issues.

Would it be better to either:

  1. Let it fail if required fields are missing (don't use defaults)
  2. Use explicit validation to ensure required fields exist
  3. Use None as the default if these fields can legitimately be absent

Wdyt? If these fields are always expected from the API, option 1 might be clearest.


1058-1090: Consider reusing _resolve_organization_id to avoid duplication.

This function duplicates the organization lookup logic from _resolve_organization_id (lines 878-942). The only difference is that this function needs to return the full organization object, while _resolve_organization_id returns just the ID.

What if you refactored to reuse the helper? For example:

@mcp_tool(...)
def describe_cloud_organization(
    *,
    organization_id: ...,
    organization_name: ...,
) -> CloudOrganizationResult:
    """Get details about a specific organization."""
    api_root = resolve_cloud_api_url()
    client_id = SecretString(resolve_cloud_client_id())
    client_secret = SecretString(resolve_cloud_client_secret())
    
    # Get all organizations
    orgs = api_util.list_organizations_for_user(
        api_root=api_root,
        client_id=client_id,
        client_secret=client_secret,
    )
    
    # Find the matching one
    if organization_id:
        matching_orgs = [org for org in orgs if org.organization_id == organization_id]
        search_key = "organization_id"
        search_value = organization_id
    elif organization_name:
        matching_orgs = [org for org in orgs if org.organization_name == organization_name]
        search_key = "organization_name"
        search_value = organization_name
    else:
        raise PyAirbyteInputError(
            message="Either 'organization_id' or 'organization_name' must be provided."
        )
    
    if not matching_orgs:
        raise AirbyteMissingResourceError(
            resource_type="organization",
            context={
                search_key: search_value,
            },
            message="No organization found for the current user.",
        )
    
    if len(matching_orgs) > 1:
        raise PyAirbyteInputError(
            message="Multiple organizations found. Please use 'organization_id' instead.",
            context={search_key: search_value},
        )
    
    org = matching_orgs[0]
    return CloudOrganizationResult(
        id=org.organization_id,
        name=org.organization_name,
        email=org.email,
    )

This also addresses the error message structure concerns by moving dynamic values to context. Wdyt?

airbyte/_util/api_util.py (1)

1646-1655: Consider adding validation for the API response structure.

The function assumes json_result.get("workspaces", []) will work, but if the API response structure changes or errors occur, this might mask issues. The function relies entirely on _make_config_api_request to raise errors.

Would it be worth adding a quick validation check? Something like:

        json_result = _make_config_api_request(
            path="/workspaces/list_by_organization_id",
            json=payload,
            api_root=api_root,
            client_id=client_id,
            client_secret=client_secret,
        )
        
        if "workspaces" not in json_result:
            raise AirbyteError(
                message="Unexpected API response structure: missing 'workspaces' field",
                context={
                    "organization_id": organization_id,
                    "response_keys": list(json_result.keys()),
                },
            )

        workspaces = json_result["workspaces"]

This would make debugging easier if the API changes. That said, it's a fairly defensive approach - the current implementation is reasonable too. Wdyt?

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between ef69ae0 and b42a3a5.

📒 Files selected for processing (2)
  • airbyte/_util/api_util.py (1 hunks)
  • airbyte/mcp/cloud_ops.py (4 hunks)
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2024-10-06T23:44:31.534Z
Learnt from: aaronsteers
Repo: airbytehq/PyAirbyte PR: 411
File: airbyte/cli.py:111-160
Timestamp: 2024-10-06T23:44:31.534Z
Learning: In PyAirbyte, error messages in functions like `_resolve_source_job` in `airbyte/cli.py` are designed to decouple the message text from dynamic values, following a structlog-inspired design. Dynamic values are provided via parameters like `input_value`. This approach helps avoid including PII in the message strings, which may be used in telemetry.

Applied to files:

  • airbyte/mcp/cloud_ops.py
🧬 Code graph analysis (1)
airbyte/_util/api_util.py (2)
airbyte/secrets/base.py (1)
  • SecretString (38-143)
airbyte/exceptions.py (1)
  • AirbyteError (432-447)
🪛 GitHub Actions: Run Linters
airbyte/mcp/cloud_ops.py

[warning] 594-594: Redundant cast: str is the same type as str [redundant-cast]


[warning] 643-643: Redundant cast: str is the same type as str [redundant-cast]


[error] 918-918: Argument str is not assignable to parameter client_id with type SecretString in function airbyte._util.api_util.list_organizations_for_user


[error] 919-919: Argument str is not assignable to parameter client_secret with type SecretString in function airbyte._util.api_util.list_organizations_for_user

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (6)
  • GitHub Check: Pytest (All, Python 3.11, Ubuntu)
  • GitHub Check: Pytest (All, Python 3.10, Ubuntu)
  • GitHub Check: Pytest (No Creds)
  • GitHub Check: Pytest (All, Python 3.11, Windows)
  • GitHub Check: Pytest (All, Python 3.10, Windows)
  • GitHub Check: Pytest (Fast)
🔇 Additional comments (2)
airbyte/_util/api_util.py (2)

1570-1605: LGTM! Function signature correctly uses SecretString types.

The function signature properly expects SecretString for client_id and client_secret (lines 1573-1574), which is the correct type. The type errors flagged by the pipeline are in the calling code (cloud_ops.py), not here. The error handling also looks appropriate.


1607-1668: Nice pagination implementation!

The pagination logic correctly handles both max_items_limit (lines 1658-1660) and the end-of-results condition (lines 1662-1664). The function properly builds up results incrementally and stops when appropriate.

Update _resolve_organization_id helper function to use SecretString
type hints for client_id and client_secret parameters, matching the
return types of resolve_cloud_client_id() and resolve_cloud_client_secret().

This fixes pyrefly type check errors where str was being passed to
functions expecting SecretString.

Co-Authored-By: AJ Steers <aj@airbyte.io>
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

♻️ Duplicate comments (3)
airbyte/mcp/cloud_ops.py (3)

917-921: Type mismatch issue may still be present.

Previous review comments flagged that resolve_cloud_client_id() and resolve_cloud_client_secret() return str rather than SecretString, but this function expects SecretString parameters. Based on the relevant code snippets, these functions should return SecretString, but if the pipeline is still failing with type errors, you may need to explicitly wrap the return values.

The same issue applies to the callers at lines 989-995 and 1052-1056.

Could you verify whether the pipeline failures are resolved? If not, you might need to wrap the credentials like:

     orgs = api_util.list_organizations_for_user(
         api_root=api_root,
-        client_id=client_id,
-        client_secret=client_secret,
+        client_id=SecretString(client_id) if isinstance(client_id, str) else client_id,
+        client_secret=SecretString(client_secret) if isinstance(client_secret, str) else client_secret,
     )

What do you think?


989-1004: Type mismatch propagates here as well.

Similar to the issue in _resolve_organization_id, these calls pass credentials that may need SecretString wrapping. This duplicates the concerns from the previous review.

The same wrapping approach would apply here. Could you confirm if the pipeline failures are resolved?


1052-1056: Type mismatch here as well.

Same issue as the previous calls: credentials may need SecretString wrapping per the previous review comments.

The same fix would apply here, wdyt?

🧹 Nitpick comments (2)
airbyte/mcp/cloud_ops.py (2)

904-940: Consider structlog-inspired error message design for consistency.

Based on learnings from the codebase, PyAirbyte error messages typically decouple the message text from dynamic values to avoid including potential PII in telemetry. Organization names and IDs might be considered sensitive.

Would you consider refactoring these error messages to use the context parameter for dynamic values? For example:

raise PyAirbyteInputError(
    message="Either 'organization_id' or 'organization_name' must be provided.",
    context={
        "organization_id": organization_id,
        "organization_name": organization_name,
    },
)

This pattern is used elsewhere in the codebase (see lines 1064-1069 and 1076-1081 in describe_cloud_organization), wdyt?

Based on learnings, PyAirbyte prefers decoupling message text from dynamic values to avoid PII in telemetry strings.


1059-1091: Consider reducing code duplication with _resolve_organization_id.

The logic here for resolving an organization by ID or name is very similar to what _resolve_organization_id does (lines 879-943). The main difference is that this function returns the full organization object while _resolve_organization_id returns just the ID.

Would you consider refactoring to call _resolve_organization_id first to get the ID, then look up the full org from the list? This would reduce duplication and make the validation logic consistent. For example:

# Get all organizations for the user
orgs = api_util.list_organizations_for_user(
    api_root=api_root,
    client_id=client_id,
    client_secret=client_secret,
)

# Resolve to ID using shared helper
resolved_org_id = _resolve_organization_id(
    organization_id=organization_id,
    organization_name=organization_name,
    api_root=api_root,
    client_id=client_id,
    client_secret=client_secret,
)

# Find the full org object
org = next((o for o in orgs if o.organization_id == resolved_org_id), None)
if not org:
    raise AirbyteMissingResourceError(...)

This is just a suggested refactor to improve maintainability, wdyt?

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between b42a3a5 and 22eca44.

📒 Files selected for processing (1)
  • airbyte/mcp/cloud_ops.py (4 hunks)
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2024-10-06T23:44:31.534Z
Learnt from: aaronsteers
Repo: airbytehq/PyAirbyte PR: 411
File: airbyte/cli.py:111-160
Timestamp: 2024-10-06T23:44:31.534Z
Learning: In PyAirbyte, error messages in functions like `_resolve_source_job` in `airbyte/cli.py` are designed to decouple the message text from dynamic values, following a structlog-inspired design. Dynamic values are provided via parameters like `input_value`. This approach helps avoid including PII in the message strings, which may be used in telemetry.

Applied to files:

  • airbyte/mcp/cloud_ops.py
🧬 Code graph analysis (1)
airbyte/mcp/cloud_ops.py (4)
airbyte/exceptions.py (2)
  • AirbyteMissingResourceError (505-509)
  • PyAirbyteInputError (201-210)
airbyte/secrets/base.py (1)
  • SecretString (38-143)
airbyte/_util/api_util.py (2)
  • list_organizations_for_user (1570-1604)
  • list_workspaces_in_organization (1607-1668)
airbyte/cloud/auth.py (3)
  • resolve_cloud_api_url (25-33)
  • resolve_cloud_client_id (17-22)
  • resolve_cloud_client_secret (9-14)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (6)
  • GitHub Check: Pytest (All, Python 3.11, Ubuntu)
  • GitHub Check: Pytest (All, Python 3.11, Windows)
  • GitHub Check: Pytest (All, Python 3.10, Windows)
  • GitHub Check: Pytest (Fast)
  • GitHub Check: Pytest (All, Python 3.10, Ubuntu)
  • GitHub Check: Pytest (No Creds)
🔇 Additional comments (3)
airbyte/mcp/cloud_ops.py (3)

11-11: LGTM on the new imports!

The addition of api_util, the exception types, and SecretString are all appropriate for the new organization/workspace discovery features.

Also applies to: 21-21, 29-29


126-146: LGTM on the new result models!

CloudOrganizationResult and CloudWorkspaceResult are well-structured with clear documentation. They follow the existing patterns in the codebase nicely.


1006-1013: Should missing workspace fields fail gracefully or raise errors?

Using .get() with empty string defaults means that if the API returns a workspace without workspaceId, name, or organizationId, you'll create a result object with empty strings rather than failing. Is this the intended behavior?

If these fields are required, consider either:

  1. Using direct dictionary access (e.g., ws["workspaceId"]) to fail fast if the field is missing
  2. Adding validation after construction to ensure no empty strings

What's your preference here? Should we validate the API response more strictly, wdyt?

@github-actions
Copy link
Copy Markdown

github-actions bot commented Dec 1, 2025

PyTest Results (Fast Tests Only, No Creds)

320 tests  ±0   320 ✅ ±0   5m 52s ⏱️ -4s
  1 suites ±0     0 💤 ±0 
  1 files   ±0     0 ❌ ±0 

Results for commit 0e133e4. ± Comparison against base commit ef69ae0.

♻️ This comment has been updated with latest results.

…cloud_organization

- Create _resolve_organization() that returns full OrganizationResponse object
- Simplify _resolve_organization_id() to call _resolve_organization and extract ID
- Refactor describe_cloud_organization() to use _resolve_organization (addresses AJ's comment)
- Refactor list_workspaces_in_organization() to use explicit has_more_pages flag instead of while True

Co-Authored-By: AJ Steers <aj@airbyte.io>
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (5)
airbyte/_util/api_util.py (1)

1607-1669: Workspace pagination logic looks good; consider reusing the bearer token for many-page orgs?

The paging loop, name_contains wiring, and max_items_limit behavior all look solid and address the earlier “no while True” feedback nicely. One minor thing: each call to _make_config_api_request re-does the bearer-token exchange, which is fine for typical “few-page” orgs but could add noticeable overhead if an admin ever has hundreds of workspaces.

Would you consider a follow-up that lets _make_config_api_request optionally accept a pre-fetched bearer token (so a single token can be reused across pages in this loop), or do you prefer to keep the simpler API and only revisit if this becomes a hot path, wdyt?

airbyte/mcp/cloud_ops.py (4)

126-146: New CloudOrganizationResult / CloudWorkspaceResult models look clean; should org email allow None?

The shapes for CloudOrganizationResult and CloudWorkspaceResult line up nicely with the underlying API fields and give MCP callers a simple, stable contract. One small question: depending on the Airbyte API schema, OrganizationResponse.email might be nullable in some edge cases; if that’s possible, would it be safer to declare email: str | None on CloudOrganizationResult (or default to "") to avoid surprises when pydantic validates a None, wdyt?


879-953: Org resolution logic is solid; consider aligning error messages with the “no PII in message text” convention.

The overall flow here is great: you enforce “either id or name, but not both”, resolve against the current user’s orgs via list_organizations_for_user, and handle both not-found and ambiguous-name cases with the right exception types.

Given the prior guidance about keeping error messages static and putting dynamic values into context fields, would you consider:

  • For the “multiple organizations found” branch, using a static message (e.g. "Multiple organizations found with this name. Please provide 'organization_id' instead.") and passing organization_name in context or input_value instead of interpolating it directly into the message string?
  • Similarly, for the not-found-by-name/ID cases, keeping any user- or org-specific identifiers out of the human-readable text and relying on context keys like "organization_id" / "organization_name"?

That would keep this consistent with the structlog-style pattern you’ve used elsewhere and avoid PII in top-level messages, wdyt?


985-1047: list_cloud_workspaces correctly enforces org scoping; would adding workspace URL/slug be useful to callers?

This tool seems to hit all the design goals:

  • It requires explicit organization context (ID or exact, case-sensitive name) and delegates validation to _resolve_organization_id.
  • It uses the new list_workspaces_in_organization helper so the Config API does the heavy lifting, including server-side name_contains filtering.
  • It returns a minimal, well-typed CloudWorkspaceResult list that’s easy for agents to consume.

One question: since the underlying payload includes fields like slug and (presumably) a way to construct the workspace web URL, would it be worth exposing either slug and/or a url field on CloudWorkspaceResult to make it easier for tools/agents to surface links without an extra lookup, or are you intentionally keeping this result schema minimal for now, wdyt?


1056-1096: describe_cloud_organization nicely reuses the resolver; just double-check email is always non-null.

Using _resolve_organization here keeps the “exact name vs ID” behavior, ambiguity checks, and not-found handling perfectly in sync with list_cloud_workspaces, which is great for predictability. Returning a compact CloudOrganizationResult with id, name, and email seems exactly in line with the “name ↔ ID lookup” use case.

Similar to the earlier model comment: if OrganizationResponse.email can ever be None, should this tool either (a) type email as str | None or (b) normalize None to "" before constructing CloudOrganizationResult, to avoid subtle validation/runtime issues, wdyt?

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 22eca44 and 95ed658.

📒 Files selected for processing (2)
  • airbyte/_util/api_util.py (1 hunks)
  • airbyte/mcp/cloud_ops.py (4 hunks)
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2024-10-06T23:44:31.534Z
Learnt from: aaronsteers
Repo: airbytehq/PyAirbyte PR: 411
File: airbyte/cli.py:111-160
Timestamp: 2024-10-06T23:44:31.534Z
Learning: In PyAirbyte, error messages in functions like `_resolve_source_job` in `airbyte/cli.py` are designed to decouple the message text from dynamic values, following a structlog-inspired design. Dynamic values are provided via parameters like `input_value`. This approach helps avoid including PII in the message strings, which may be used in telemetry.

Applied to files:

  • airbyte/mcp/cloud_ops.py
🧬 Code graph analysis (2)
airbyte/_util/api_util.py (2)
airbyte/secrets/base.py (1)
  • SecretString (38-143)
airbyte/exceptions.py (1)
  • AirbyteError (432-447)
airbyte/mcp/cloud_ops.py (4)
airbyte/exceptions.py (2)
  • AirbyteMissingResourceError (505-509)
  • PyAirbyteInputError (201-210)
airbyte/secrets/base.py (1)
  • SecretString (38-143)
airbyte/_util/api_util.py (2)
  • list_organizations_for_user (1570-1604)
  • list_workspaces_in_organization (1607-1669)
airbyte/cloud/auth.py (3)
  • resolve_cloud_api_url (25-33)
  • resolve_cloud_client_id (17-22)
  • resolve_cloud_client_secret (9-14)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (6)
  • GitHub Check: Pytest (All, Python 3.11, Windows)
  • GitHub Check: Pytest (All, Python 3.10, Ubuntu)
  • GitHub Check: Pytest (All, Python 3.11, Ubuntu)
  • GitHub Check: Pytest (All, Python 3.10, Windows)
  • GitHub Check: Pytest (Fast)
  • GitHub Check: Pytest (No Creds)
🔇 Additional comments (2)
airbyte/_util/api_util.py (1)

1570-1604: Org listing helper looks correct and aligned with the public API.

The wrapper cleanly delegates to organizations.list_organizations_for_user(), checks for a successful response, and raises a rich AirbyteError otherwise; this feels consistent with the rest of api_util’s patterns, and not adding pagination here matches the endpoint’s current contract. Looks good to me as‑is.

airbyte/mcp/cloud_ops.py (1)

956-975: Convenience wrapper _resolve_organization_id is a nice narrow helper.

Wrapping _resolve_organization to return just the ID keeps the public tools focused on IDs without duplicating resolution logic, and centralizes all ambiguity/not-found behavior in one place. This reads well and matches the PR’s intent.

@github-actions
Copy link
Copy Markdown

github-actions bot commented Dec 1, 2025

PyTest Results (Full)

388 tests  ±0   371 ✅  - 1   25m 53s ⏱️ + 1m 8s
  1 suites ±0    16 💤 ±0 
  1 files   ±0     1 ❌ +1 

For more details on these failures, see this check.

Results for commit 0e133e4. ± Comparison against base commit ef69ae0.

♻️ This comment has been updated with latest results.

…ge size

- Add explicit check for empty results before extending result list
- Exit early if no workspaces returned (handles out-of-range offset case)
- Keep existing check for partial page to avoid unnecessary API calls

Co-Authored-By: AJ Steers <aj@airbyte.io>
@aaronsteers Aaron ("AJ") Steers (aaronsteers) merged commit 2d495ba into main Dec 2, 2025
21 of 23 checks passed
@aaronsteers Aaron ("AJ") Steers (aaronsteers) deleted the devin/1764630057-org-workspace-mcp-tools branch December 2, 2025 07:46
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant